{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Changing the Linear Components\n",
    "\n",
    "Here we calculate trends using the SAGE II/OSIRIS/OMPS-LP Dataset.  Trends are calculated in the following ways\n",
    "\n",
    "* With two linear terms that have a join point at 1997.\n",
    "* Same as above but with the join point changed to 2000\n",
    "* By fitting ozone anomalies without any linear component, and then fitting the residuals to a linear term\n",
    "* Using two orthogonal forms of the EESC\n",
    "\n",
    "Start with some imports and loading in the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "import numpy as np\n",
    "from LOTUS_regression.regression import regress_all_bins\n",
    "from LOTUS_regression.predictors import load_data\n",
    "import LOTUS_regression.predictors.download as download\n",
    "import LOTUS_regression.plotting.trends as trends\n",
    "import matplotlib.pyplot as plt\n",
    "from datetime import datetime\n",
    "import pandas as pd\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "MERGED_FILE = r'/home/runner/work/lotus-regression/lotus-regression/test_data/S2_OS_OMPS/MERGED_LOTUS.nc'\n",
    "\n",
    "mzm_data = xr.open_dataset(MERGED_FILE, engine='netcdf4')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have our default set of predictors which contains linear terms joining at 1997"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "predictors = load_data('pred_baseline_pwlt.csv').to_period('M')\n",
    "\n",
    "print(predictors.columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And our default set of trends"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "results = regress_all_bins(predictors, mzm_data['relative_anomaly'], tolerance=0.1)\n",
    "\n",
    "# Convert to ~ percent\n",
    "results *= 100\n",
    "\n",
    "trends.pre_post_with_confidence(results, x='mean_latitude', y='altitude', ylim=(18, 50), log_y=False, figsize=(16, 6),\n",
    "                                x_label='Latitude [$^\\circ$]', y_label='Altitude [km]', pre_title='Pre 1997',\n",
    "                                post_title='Post 1997')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we change our inflection point in the linear trend to 2000 instead of 1997, and repeat the analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "predictors[['linear_pre', 'linear_post']] = download.load_linear(inflection=2000)[['pre','post']]\n",
    "\n",
    "predictors[['linear_pre', 'linear_post']].plot(figsize=(16, 6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "results = regress_all_bins(predictors, mzm_data['relative_anomaly'], tolerance=0.1)\n",
    "\n",
    "# Convert to ~ percent\n",
    "results *= 100\n",
    "\n",
    "trends.pre_post_with_confidence(results, x='mean_latitude', y='altitude', ylim=(18, 50), log_y=False, figsize=(16, 6),\n",
    "                                x_label='Latitude [$^\\circ$]', y_label='Altitude [km]', pre_title='Pre 2000',\n",
    "                                post_title='Post 2000')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we remove the linear terms from our predictors and tell the regression call to post fit a trend to the residuals after the year 2000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "predictors = predictors.drop(['linear_pre', 'linear_post'], axis=1)\n",
    "\n",
    "results = regress_all_bins(predictors, mzm_data['relative_anomaly'], tolerance=0.1, post_fit_trend_start='2000-01-01')\n",
    "\n",
    "# Convert to ~ percent\n",
    "results *= 100\n",
    "\n",
    "trends.post_with_confidence(results, x='mean_latitude', y='altitude', ylim=(18, 50), log_y=False, figsize=(8, 6),\n",
    "                                x_label='Latitude [$^\\circ$]', y_label='Altitude [km]',\n",
    "                                post_title='Post 2000')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly we add two orthogonal EESC terms and redo the regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "predictors[['eesc_1', 'eesc_2']] = download.load_orthogonal_eesc('/home/runner/work/lotus-regression/lotus-regression/test_data/EESC.txt')[['eesc_1', 'eesc_2']]\n",
    "\n",
    "predictors[['eesc_1', 'eesc_2']].plot(figsize=(16, 6))\n",
    "\n",
    "results = regress_all_bins(predictors, mzm_data['relative_anomaly'], tolerance=0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results are trickier to interpret, but we can combine the two EESC terms to find the total contribution in each bin for the EESC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "predictors = predictors[(pd.to_datetime(predictors.index.to_timestamp()) >= mzm_data.time.values[0]) & (pd.to_datetime(predictors.index.to_timestamp()) <= mzm_data.time.values[-1])]\n",
    "\n",
    "eesc_contrib = results['eesc_1'].values[:, :, np.newaxis] * predictors['eesc_1'].values[np.newaxis, np.newaxis, :] +\\\n",
    "               results['eesc_2'].values[:, :, np.newaxis] * predictors['eesc_2'].values[np.newaxis, np.newaxis, :]\n",
    "    \n",
    "plt.plot(mzm_data.time.values, eesc_contrib[2, 40, :].T)\n",
    "np.shape(eesc_contrib)\n",
    "plt.title('EESC Contribution at 45 S, 40 km')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With some annoying time conversions we can also find the predicted turnaround time in each bin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "turnaround_index = np.argmin(eesc_contrib, axis=2)\n",
    "\n",
    "turnaround_times = mzm_data.time.values[turnaround_index]\n",
    "\n",
    "def to_year_fraction(date):\n",
    "    def since_epoch(date): # returns seconds since epoch\n",
    "        return time.mktime(date.timetuple())\n",
    "    \n",
    "    try:\n",
    "        ts = (np.datetime64(date, 'ns') - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')\n",
    "    except:\n",
    "        print(np.datetime64(date))\n",
    "        return np.nan\n",
    "\n",
    "    dt = datetime.utcfromtimestamp(ts)\n",
    "    s = since_epoch\n",
    "\n",
    "    year = dt.year\n",
    "    startOfThisYear = datetime(year=year, month=1, day=1)\n",
    "    startOfNextYear = datetime(year=year+1, month=1, day=1)\n",
    "\n",
    "    yearElapsed = s(dt) - s(startOfThisYear)\n",
    "    yearDuration = s(startOfNextYear) - s(startOfThisYear)\n",
    "    fraction = yearElapsed / yearDuration\n",
    "\n",
    "    return dt.year + fraction\n",
    "\n",
    "turnaround_times = np.frompyfunc(to_year_fraction, 1, 1)(turnaround_times).astype(float)\n",
    "\n",
    "turnaround_times[turnaround_index == 0] = np.nan\n",
    "turnaround_times[turnaround_index == len(mzm_data.time.values)-1] = np.nan\n",
    "\n",
    "plt.pcolor(mzm_data.mean_latitude.values, mzm_data.altitude.values, np.ma.masked_invalid(turnaround_times.T))\n",
    "plt.clim(1984, 2010)\n",
    "plt.ylim(20, 50)\n",
    "plt.ylabel('Altitude [km]')\n",
    "plt.xlabel('Latitude')\n",
    "plt.title('Turnaround Time')\n",
    "plt.colorbar()"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Edit Metadata",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}